A sequential test for variable selection in high dimensional complex data

نویسندگان

  • Kofi P. Adragni
  • Moumita Karmakar
چکیده

Given a high dimensional p-vector of continuous predictors X and a univariate response Y , principal fitted components (PFC) provide a sufficient reduction of X that retains all regression information about Y in X while reducing the dimensionality. The reduction is a set of linear combinations of all the p predictors, where with the use of a flexible set of basis functions, predictors related to Y via complex, nonlinear relationship can be detected. In the presence of possibly large number of irrelevant predictors, the accuracy of the sufficient reduction is hindered. The proposed method adapts a sequential test to the PFC to obtain a ‘‘pruned’’ sufficient reduction that shedoff the irrelevant predictors. The sequential test is based on the likelihood ratio which expression is derived under different covariance structures of X |Y . The resulting reduction has an improved accuracy and also allows the identification of the relevant variables. © 2014 Elsevier B.V. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters

Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...

متن کامل

Geostatistical simulation of RQD variable using investigation of spatial continuity between Quaternary alluvial layer and hard rock of Gohar-Zamin mine to the determination of permeable zones

In this research, a sequential Gaussian simulation method has been used to determine the permeable zones in the hard-rock aquifer of the Gohr-Zamin open pit mine. For this purpose, 4946 RQD data from eighty-seven exploratory boreholes was used and exploratory-spatial data analysis of these data was performed using the preliminary statistics, location maps, histograms and variograms. Results of ...

متن کامل

Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...

متن کامل

Applying Combined Approach of Sequential Floating Forward Selection and Support Vector Machine to Predict Financial Distress of Listed Companies in Tehran Stock Exchange Market

Objective: Nowadays, financial distress prediction is one of the most important research issues in the field of risk management that has always been interesting to banks, companies, corporations, managers and investors. The main objective of this study is to develop a high performance predictive model and to compare the results with other commonly used models in financial distress prediction M...

متن کامل

Methods for regression analysis in high-dimensional data

By evolving science, knowledge and technology, new and precise methods for measuring, collecting and recording information have been innovated, which have resulted in the appearance and development of high-dimensional data. The high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computational Statistics & Data Analysis

دوره 81  شماره 

صفحات  -

تاریخ انتشار 2015